Assortative mating on blood type: Evidence from one million Chinese pregnancies

Significance
In the human population, spousal pairs have been found to share phenotypes, which demonstrates the highly nonrandom nature of human mate choice. However, assortative mating on blood type, one of the most fundamental phenotypes in biological, medical, and psychological studies, has not been investigated. Using a unique dataset from China, we test statistically whether matching on blood type is nonrandom and find strong evidence for assortative mating on blood type. The findings are robust after we control for other possible mechanisms, and they show that the spousal concordance on blood type we observe is attributable not only to an individual's mate opportunity but also to their mate choice.


Results
We performed a series of statistical tests to explore assortative mating on blood type. We first used Pearson's chi-square test on a contingency table of spousal pairs' blood types to evaluate whether blood type influences human mate choice. Assortative mating typically refers to a mating pattern in which individuals with similar phenotypes mate with each other more frequently than predicted under random mating. This definition implies that the degree of assortative mating on a certain phenotype can be assessed by a chi-square test that compares the contingency table for pairs' blood types with the contingency table generated by random matching. To perform this test, we used the full sample of Chinese prepregnancy checkup data and aggregated information on each spousal pair's blood types into a contingency table. The raw dataset consists of 1,137,010 couples who were followed up and became pregnant within 6 mo after the prepregnancy exam, from all 31 provincial administrative regions of Mainland China. To facilitate our later analysis, we removed observations with incomplete information related to the couple's blood types, living areas, birthplaces, ethnicity, and marital status. We obtained a sample of 931,964 couples with complete personal information to serve as our full sample. As the contingency table reports, diagonal elements have a higher frequency than expected under uniform random mating, which shows that individuals with the same blood type are more likely to marry each other (Table 1). Pearson's chi-square test on the full sample (chi-square statistic: 4020.942, P < 0.001, degrees of freedom: 9, Cramer's V: 0.038) further supports nonrandom mating.
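The test statistic and effect size described above can be computed from any contingency table in a few lines. The following is a minimal Python sketch, not the authors' Stata or R code; the function name and the small 4 × 4 table of counts are hypothetical and serve only as illustration. The final lines check that the reported chi-square statistic and sample size imply the reported Cramer's V.

```python
import math


def chi_square_independence(table):
    """Return (chi2, dof, cramers_v) for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in cell (i, j) if matching were random.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    r, c = len(table), len(table[0])
    dof = (r - 1) * (c - 1)
    cramers_v = math.sqrt(chi2 / (n * min(r - 1, c - 1)))
    return chi2, dof, cramers_v


# Hypothetical counts (rows: husband A, B, AB, O; columns: wife A, B, AB, O).
table = [
    [320, 250, 90, 340],
    [250, 300, 85, 330],
    [90, 85, 40, 110],
    [340, 330, 110, 520],
]
chi2, dof, v = chi_square_independence(table)

# Consistency check using the figures reported in the text:
# chi-square = 4020.942, n = 931,964 couples, min(r - 1, c - 1) = 3.
v_reported = math.sqrt(4020.942 / (931_964 * 3))  # about 0.038
```

Because Cramer's V divides the statistic by the sample size, a very large chi-square value can coexist with a small effect size, as it does here.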
We subsequently adopted alternative measures of the degree of assortative mating to investigate whether the conclusion obtained from the chi-square test is robust. The chi-square test shows whether we can reject the null hypothesis of random mating, but it cannot tell us the specific pattern the mating process follows if the null hypothesis is rejected. To learn whether mating choice is assortative, we are particularly interested in the diagonal elements of the contingency table. Specifically, we wanted to know whether the counts in these four cells differ significantly from those expected under uniform random matching. To do this, we computed adjusted Pearson residuals, which indicate whether the number of mating pairs in a specific cell differs significantly from the expected number, and performed statistical tests on the residuals of the diagonal elements at the α = 5% level. The four same-type pairs, i.e., the diagonal elements of the country-level contingency table, (A, A), (B, B), (AB, AB), and (O, O), have adjusted Pearson residuals of 21.312, 22.420, 16.846, and 59.482, respectively, all of which exceed the two-sided critical value after Bonferroni correction, ±2.498. The results suggest that individuals of all blood types tend to marry those who share the same blood type.
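As an illustration of the residual analysis, the sketch below computes adjusted Pearson residuals with the usual formula and a Bonferroni-corrected two-sided critical value. It is a minimal Python sketch under stated assumptions: the function names and the table of counts are hypothetical, and the assumption that the correction covers the four diagonal tests is inferred from the reported critical value rather than stated in the text.

```python
import math
from statistics import NormalDist


def adjusted_residuals(table):
    """Adjusted Pearson residual for every cell of an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Variance of (observed - expected) under independence.
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(out_row)
    return residuals


def bonferroni_critical_value(alpha, n_tests):
    """Two-sided standard normal critical value after Bonferroni correction."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))


# Hypothetical counts (rows: husband A, B, AB, O; columns: wife A, B, AB, O).
table = [
    [320, 250, 90, 340],
    [250, 300, 85, 330],
    [90, 85, 40, 110],
    [340, 330, 110, 520],
]
res = adjusted_residuals(table)
diagonal = [res[k][k] for k in range(4)]

# With alpha = 0.05 and four diagonal tests this is about 2.498.
critical = bonferroni_critical_value(0.05, 4)
```

A diagonal residual above the critical value indicates more same-type couples than random matching would produce.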
We also used the Altham index, which is widely used in testing the association of unordered rows and columns of an r × s contingency table, as an alternative measure of the overall extent of assortative mating (23–25). The Altham index uses the odds ratio of the likelihood of matching within different blood type pairs to capture the distance between the row-column associations in the observed contingency table and those generated by random matching patterns (see Methods). Its value is equal to zero if mating choice is random and increases with the degree of nonrandom row-column association. The estimate of the Altham index with the full sample is 2.063, which demonstrates a significant correlation in spousal blood types.
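To make the idea concrete, the sketch below implements the commonly cited form of the Altham index, measured against a table with independent rows and columns (for which every log odds ratio is zero). This is a minimal Python sketch: the function name and example tables are hypothetical, and the scaling used for the 2.063 estimate may differ from this standard form.

```python
import math


def altham_index_vs_independence(table):
    """Altham distance between a table of positive counts and independence.

    Sums the squared log odds ratios over all pairs of rows (i, l) and all
    pairs of columns (j, m), then takes the square root.
    """
    r, c = len(table), len(table[0])
    total = 0.0
    for i in range(r):
        for j in range(c):
            for l in range(r):
                for m in range(c):
                    odds_ratio = (table[i][j] * table[l][m]) / (
                        table[i][m] * table[l][j]
                    )
                    total += math.log(odds_ratio) ** 2
    return math.sqrt(total)


# An independent table (each row proportional to the others) gives zero.
independent = [[10, 20], [30, 60]]
zero_index = altham_index_vs_independence(independent)

# A table with excess counts on the diagonal gives a positive index.
assortative = [[2, 1], [1, 2]]
positive_index = altham_index_vs_independence(assortative)
```

The index depends only on odds ratios, so it is unaffected by the marginal blood type frequencies, which is what makes it suitable for comparing association across tables.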
Because of possible geographical heterogeneity in the blood type distribution, we then restricted our analysis to local subsamples (couples who were born in the same region and received a prepregnancy checkup in this region) and performed a meta-analysis on assortative mating to investigate whether the assortative mating pattern still exists after controlling for possible subpopulation structure and whether this pattern is universal among different areas in China (15). Population stratification is one of the key drivers of spousal concordance that is independent of individuals' mate choice. Individuals from subpopulations with residential proximity may naturally have more opportunities to mate. Moreover, they usually share similarities in their blood types because of their similar ethnic backgrounds. Without controlling for population stratification, we may overestimate the degree of assortative mating. Local subsample analysis is therefore used to reduce the estimation bias of assortative mating induced by population stratification. We identified locally matched observations using the birthplace and living area information provided by the Chinese prepregnancy checkup data. The meta-analysis is carried out at the city level (that is, we evaluate assortative mating within a city). We report statistical results for the 16 cities whose sample sizes (i.e., number of pregnant couples) are larger than 10,000 (Table 2 and Fig. 1). In 13 of the 16 cities, Pearson's chi-square statistics indicate that mating choice is nonrandom at the 5% level of significance. Adjusted Pearson residuals further suggest that assortative mating on blood type is common in different areas, even after the subpopulation structure is controlled for. Blood type pairs (O, O) typically have higher adjusted Pearson residuals than other blood type pairs, and thus show a higher degree of assortative mating.
Also, the values of adjusted Pearson residuals we obtained from locally matched samples in each city are generally lower than those obtained from the national full sample. A similar pattern is observed in the forest plot, in which 12 of the 16 cities have an effect size, i.e., Cramer's V, larger than 0.03 and seven among them have an effect size greater than 0.05 (Fig. 1). The fixed-effects estimate of Cramer's V over the 16 subsamples is 0.046.
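A fixed-effects pooled estimate of this kind is conventionally an inverse-variance weighted average of the subsample estimates. The sketch below illustrates the calculation in Python; the function name and the three city-level values are hypothetical and are not taken from Table 2.

```python
def fixed_effects_pool(estimates, standard_errors):
    """Inverse-variance weighted average of subsample effect sizes.

    Returns the pooled estimate, its standard error, and the normalized
    weights (each inverse variance divided by the sum of inverse variances).
    """
    inverse_variances = [1.0 / se ** 2 for se in standard_errors]
    total = sum(inverse_variances)
    weights = [iv / total for iv in inverse_variances]
    pooled = sum(w * est for w, est in zip(weights, estimates))
    pooled_se = (1.0 / total) ** 0.5
    return pooled, pooled_se, weights


# Hypothetical Cramer's V estimates and bootstrap SEs for three cities.
city_v = [0.031, 0.052, 0.044]
city_se = [0.004, 0.006, 0.005]
pooled_v, pooled_se, weights = fixed_effects_pool(city_v, city_se)
```

Subsamples with smaller error variance receive larger weights, so precisely estimated cities dominate the pooled value.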
Meta-analysis helps to eliminate the possible effects of subpopulation structure on assortative mating. However, another possible mechanism, relationship maintenance, may also confound estimation of assortative mating. The similarity of spousal pairs' blood types could arise because couples with the same blood type find it easier to stay together. As some studies have indicated, spousal pairs with similar phenotypes, such as alcohol consumption, could be more likely to remain in a relationship (26–28). If blood type is associated with some phenotypes, then couples with the same blood type might get along better because of the similarities they share and thus be more likely to appear in the prepregnancy checkup dataset, because their relationship has a lower probability of ending before they marry. Therefore, we perform regression analysis to isolate the effect of assortative mating on spousal concordance on blood type from that of relationship maintenance as well as that of subpopulation structure. The regressions help us quantitatively evaluate how much variation in an individual's blood type can be explained by assortative mating and by the two alternative mechanisms, population stratification and relationship maintenance. Specifically, we regress an individual's blood type on their partner's blood type and incorporate a group of control variables in the regression, which includes the share of the individual's blood type in the population in their birthplace, the share in their ethnicity, the length of the marriage, and its interaction term with the partner's blood type (see Methods for details). For comparison, we also run regressions without controlling for other factors.
The results are reported in Tables 3 and 4. In the odd-numbered columns, positive coefficients of indicator variables that show whether the individual and her partner have the same blood type offer evidence for assortative mating on blood type. After incorporating control variables, as reported in even-numbered columns, the magnitudes of indicator variables' coefficients decline considerably, and most control variables show statistical significance; this validates that estimates of assortative mating can be biased by confounding factors such as population stratification and relationship maintenance. It can be further seen from the decline in estimates of indicator variables' coefficients after incorporation of control variables that approximately 30 to 40% of spousal concordance on blood type observed in the odd-numbered columns in Tables 3 and 4 is attributed to confounders, while the remaining portion of spousal similarities can be ascribed to assortative mating. After incorporation of a group of control variables, the coefficients of the partner's blood type are still statistically significant at the 1% level, which indicates highly nonrandom matching on blood type. The coefficients of variables with statistical significance represent the extent to which the corresponding mechanism can explain the individual's blood type. By decomposing spousal concordance on blood type into several mechanisms, even-numbered columns in Tables 3 and 4 reflect the magnitude of different mechanisms' effects on spousal similarities observed in data. The results show that both subpopulation structure and assortative mating play key roles in explaining the observed pattern, while relationship maintenance fails to be a convincing explanation. One's blood type is mostly explained by the distribution of blood types in the population of her birthplace. 
A 1% rise in the share increases the odds of having the corresponding blood type by approximately 4 to 5% for individuals with type A, B, or O blood and by approximately 12% for those with type AB blood. In addition to local population structure, the individual's ethnicity explains a considerable fraction of her blood type. A 1% increase in the share of her blood type in her ethnicity increases the odds of having the corresponding blood type by approximately 1 to 2% for those with type A, B, or O blood and by approximately 5% for those with type AB blood. Her partner's blood type is also an effective predictor of her blood type, which provides evidence for assortative mating. If her partner has a given blood type, the odds of having the same blood type will increase by approximately 8% for those with type A blood, 7% for those with type B blood, 15% for those with type AB blood, and 20% for those with type O blood (Tables 3 and 4). More assortative mating is observed within spousal pairs with type AB blood or type O blood. The almost equal estimates in Tables 3 and 4 suggest that the degrees of assortative mating are similar between females and males. We did not find strong evidence to support the argument that the similarity of spousal pairs' blood types is associated with length of marriage, which is used as a proxy for relationship maintenance: The coefficients of the interaction terms of length of marriage and the indicator variables for partner's blood type are insignificant or of small magnitude (lower than 0.001). The estimated increase in the odds roughly indicates the degree of assortative mating on blood type.
To estimate the increase in the probability of matching between individuals that can be explicitly attributed to blood type assortative mating, we repeated the above regression analysis using a linear regression model and report the results in Tables 5 and 6. Like the results obtained by logistic regression analysis, individuals with the same blood type have a significantly higher probability of matching with each other than those with different blood types. The results are robust to the inclusion of a group of control variables. If one's partner is of a specific blood type, the probability of having the same blood type will increase by approximately 1.6% for those with type A blood, 1.3% for those with type B blood, 1.2% for those with type AB blood, and 4.4% for those with type O blood (Tables 5 and 6). Again, linear regression analysis provides clear evidence to support assortative mating on blood type.
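The difference between the two scales (odds versus probability) is easiest to see in the simplest case without control variables. With a single binary predictor and an intercept, the logistic coefficient equals the log odds ratio of the 2 × 2 table, and the linear regression slope equals the difference in proportions. The Python sketch below uses hypothetical counts and a hypothetical function name; it does not reproduce the controlled models in Tables 3 to 6.

```python
import math


def odds_ratio_and_risk_difference(n11, n10, n01, n00):
    """Summaries of a 2 x 2 table for one blood type g.

    n11: partner has type g, individual has type g
    n10: partner has type g, individual does not
    n01: partner does not have type g, individual has type g
    n00: neither has type g
    """
    p_partner_g = n11 / (n11 + n10)      # P(individual g | partner g)
    p_partner_not_g = n01 / (n01 + n00)  # P(individual g | partner not g)
    odds_ratio = (n11 * n00) / (n10 * n01)
    # Logistic coefficient (no controls) and linear probability slope.
    return math.log(odds_ratio), odds_ratio, p_partner_g - p_partner_not_g


# Hypothetical counts: 30% vs. 28% of individuals have type g.
log_or, odds_ratio, risk_diff = odds_ratio_and_risk_difference(300, 700, 700, 1800)

percent_increase_in_odds = (odds_ratio - 1) * 100  # about 10.2%
percentage_point_increase = risk_diff * 100        # about 2 points
```

The same underlying association therefore reads as roughly a 10% increase in odds but only a 2 percentage point increase in probability in this hypothetical example, which is why the logistic and linear estimates are reported on such different scales.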

Possible Reasons for Assortative Mating on Blood Type
Having shown robust evidence for assortative mating on blood type, we investigate potential reasons. One possible explanation is that blood type may act as a proxy for other phenotypes. As previously stated, many studies have validated assortative mating on a group of phenotypes, such as BMI, weight, height, and IQ (10, 16–21). That is, individuals tend to choose a partner who shares similarities along these dimensions when making mate choices. If blood type is associated with these phenotypes, spousal concordance on blood type will be observed because of assortative mating. Using personal information provided by the dataset, we examine bivariate correlations between blood type and other phenotypes (Fig. 2). There appear to be some associations between blood type and the phenotypes we examine: education, job type, height, weight, pressure, and drinking habits. However, most associations have a relatively small correlation coefficient lying between −0.03 and 0.03.
To further explore to what extent assortative mating on blood type can be explained by its correlation with other phenotypes, we performed mediation analysis. Specifically, we first regressed the individual's blood type on her partner's using a logistic regression model, then incorporated a mediator (i.e., one of the partner's phenotypes that might be associated with his blood type) to see whether and to what degree the effect of the partner's blood type on the individual's blood type is weakened after the mediator is included in the regression. We report the results of mediation analysis in Tables 7 and 8. As can be seen, the coefficients of the partner's blood type decline after we include different mediators in the regression models, which shows that the associations between blood type and other phenotypes can explain assortative mating on blood type to a certain degree. We see from columns 2 to 9 in Table 7 and column 1 in Table 8 that the proportion of the coefficients of the partner's blood type absorbed by mediators varies with blood type. For individuals with type B blood, when all mediators are included, the coefficients of the partner's blood type are reduced by around 6 to 7%; for those with type A blood, the incorporation of mediators has little effect on estimation results for coefficients of the partner's blood type, as shown in column 1 in Table 8. As for those with type AB blood or type O blood, the scale of mediator absorption is about 3 to 4%. However, a large fraction of assortativity remains unexplained.
When we further included a group of control variables to isolate our measure of assortative mating from confounding factors-such as population stratification, province-level fixed effects, or even the individual's phenotypes-in the regression, as indicated by the statistical significance of the coefficients of the partner's blood type in columns 2 to 4 in Table 8, we still found strong evidence for assortative mating on blood type. These findings suggest that there could be other potential mechanisms for this pattern we observe in the data. Further investigation into this is left for future research.
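The share of the partner coefficient absorbed by a mediator, as discussed above, can be written compactly. The symbols below are illustrative assumptions rather than the authors' notation: $\hat{\beta}_{\text{base}}$ is the coefficient of the partner's blood type without mediators, and $\hat{\beta}_{\text{med}}$ is the same coefficient after the mediator is added.

```latex
\text{share absorbed} \;=\; \frac{\hat{\beta}_{\text{base}} - \hat{\beta}_{\text{med}}}{\hat{\beta}_{\text{base}}}
```

For instance, hypothetical coefficients of 0.080 and 0.075 would give $(0.080 - 0.075)/0.080 = 6.25\%$, which is the order of magnitude reported for type B blood.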

Conclusion
In summary, we provide evidence of assortative mating on blood type. The degree of assortative mating varies among individuals with different blood types and across locations. Our findings are robust after we control for other possible mechanisms, such as environmental confounding and relationship maintenance. We further examine potential mechanisms for these observations.
Our study makes two contributions. First, our empirical results shed light on nonrandom matching on blood type, one of the most well-known human phenotypes, which, to the best of our knowledge, has not been fully investigated. Assortative mating on blood type will have important genetic consequences by influencing the direction of the evolution of blood type distribution in the population. On the one hand, it will intensify trait divergence compared with random mating, and thus heighten the population's response to directional natural selection (29,30). On the other hand, it reduces the heterozygosity of the population's blood type and may promote inbreeding depression (31). These evolutionary effects render it important to investigate this issue.
Second, we improve the causal inference of assortative mating using a group of approaches. To mitigate the estimation bias caused by population stratification, we restrict our analysis to locally matched subsamples to perform meta-analysis. We further address this concern by running regressions with control variables. Other possible mechanisms are also controlled for in our analysis. Our robust results show causal evidence for assortative mating on blood type.
We acknowledge the limitations of our study. First, couples who are lost to follow-up or fail to become pregnant after the prepregnancy examination are missing from our dataset. Selection into the sample introduced by the two kinds of prescreening could be (but does not appear to be) a potential source of bias. Second, although overall evidence for assortative mating was found and some potential mechanisms behind the pattern were examined, a large fraction of the assortative mating we observe from the data remains unexplained. It would be of interest to understand the underlying mechanisms in future research. Third, we should be cautious when extrapolating findings in the Chinese population to other populations. Further evidence is needed to investigate whether the nonrandom matching pattern of blood types is also robust in other populations.
Finally, we sought to prevent our estimates of the degree of assortative mating from being confounded by other factors (6, 21, 28, 32–42), such as population stratification and relationship maintenance after spousal pairing. We acknowledge that it is difficult to infer a causal relationship between blood type similarity and mate choice. As previously noted, when a latent subpopulation structure underlies the observed sample, the positive association of blood type within spousal pairs can be attributed to systematic differences among subpopulations that arise from the ancestral differences they inherit. Under this circumstance, assortative mating will be overestimated if homogeneous sampling is assumed. An alternative explanation for the similarity of blood types within spousal pairs is that blood type concordance helps spousal pairs maintain their relationship. Partners with the same blood type might share similar phenotypes, and thus find it easier to get along and maintain their relationship, which increases their probability of being enrolled in the study and leads to biased estimation. All three possible mechanisms may result in spousal concordance on blood type (Fig. 3). We strove to control for confounding factors by incorporating a group of control variables to indicate the blood type distribution of subpopulations and the length of couples' relationship when estimating the extent of assortative mating on blood type using regression models. However, there might still be concern regarding whether confounding factors have been effectively ruled out and whether a causal relationship between blood type concordance and mate choice is clearly identified. To further study assortative mating on blood type in future research, we will need to carefully distinguish this mechanism from the other two possible explanations.

Data.
Chinese prepregnancy checkup data. In this study, we used 2014 to 2015 Chinese prepregnancy checkup data to perform statistical analysis. Our dataset is from the National Free Preconception Health Examination Project (NFPHEP), which was launched by the National Health Commission (NHC) and the Ministry of Finance of the People's Republic of China and offers free prepregnancy examinations for low-income married couples in urban areas and married couples in rural areas who plan to get pregnant in the next 6 mo. The nationwide project covers married couples with pregnancy plans in 2,907 counties from all 31 provincial administrative regions of Mainland China. Aiming to reduce birth defects and improve the health of newborns, China's NHC strives to expand coverage of the survey with the aid of local communities and family planning service agencies. According to China's NHC, the survey covered over 95% of the targeted population between 2014 and 2015 (43,44). Couples who had made plans for pregnancy were enrolled by local community staff and given prepregnancy examinations. Blood samples were collected during the examination and sent to local laboratories for blood type testing. In addition to the results of physical examinations, the NFPHEP also collects all participants' basic personal information (e.g., sex, address of residence, ethnicity, birthplace, and marriage) via a standardized questionnaire as well as participants' identification cards (45–47). Two follow-up telephone interviews were conducted after the examination by trained nurses. The first was performed within 3 mo after the examination to check pregnancy status, and the second was carried out within 1 y after the first follow-up interview to track the pregnancy outcome. After excluding couples who were lost to follow-up or failed to become pregnant after the prepregnancy examination, 1,137,010 couples are included in the dataset. Informed consent was signed by all project participants before enrollment. This study was approved by NHC.
Subsample with complete personal information. We removed observations with incomplete information related to the couple's blood types, living areas, birthplaces, ethnicity, marriage, education, job type, body height, weight, pressure, and drinking habits, which we used in our statistical analysis, and obtained a sample of 931,964 couples for the full sample of our study. The raw dataset of Chinese prepregnancy checkup data was filtered in Stata.

Contingency table.
We broke down the numbers of matches between individuals with different blood types by sex to produce a 4 × 4 contingency table, with the number in grid (i, j), $N_{i,j}$, representing the frequency of matching between males with type i blood and females with type j blood. The ratios of observed frequencies over expected frequencies are also reported, with grid (i, j) computed as the observed count $N_{i,j}$ divided by the count expected under random matching (the product of the corresponding row and column totals divided by the total number of couples).
Altham index. In the Altham index, $I^{g}_{\text{Husband}}$ and $I^{g}_{\text{Wife}}$ refer to indicator variables that show whether the husband or the wife has type g blood, and $P(I^{g}_{\text{Wife}} = 1, I^{h}_{\text{Husband}} = 1)$ represents the probability of matching between females with type g blood and males with type h blood, which is estimated by the share of this type of matching in the full sample. The higher the index, the more nonrandom the mate choice; it is equal to zero when matching is random. The Altham index was computed using R.
Meta-analysis. As a sensitivity analysis, we restricted the samples for Pearson's chi-square test to locally matched couples to prevent our test on assortative mating from being confounded by population stratification. Specifically, we included couples who were born in and received prepregnancy checkups in the same area and stratified them by birthplace. Meta-analysis is performed at the city level, which allows more granular segmentation, for 16 locally matched subsamples. The meta-analysis results of the chi-square test and effect size (Cramer's V) estimation among subsamples are reported through a summary table and a forest plot. The SE and 95% CI of Cramer's V are estimated through a bootstrap method with 1,000 replications. A fixed-effects meta-analysis model is further used to estimate the average true effect size of assortative mating on blood type over the 16 cities. The model estimates it by computing the weighted average of the city-specific effect sizes (j = 1, 2, ..., 16) estimated in the preceding meta-analysis, in which the weight $W_j$ assigned to each subsample is equal to the inverse of the error variance of the subsample's Cramer's V divided by the sum of these inverses across subsamples. The full sample was filtered in Stata to obtain locally matched subsamples. The bootstrap estimation of the SE and 95% CI of Cramer's V among subsamples was also performed in Stata. The remaining parts of the meta-analysis of subsamples were performed using R.
Notes to Table 8: Column 2 reports results of the logistic model that incorporates all mediators examined in columns 2 to 9 of Table 7 and additional control variables, including the share of the individual's blood type in the population living in her birthplace, the share in their ethnicity, length of marriage, and its interaction term with the partner's blood type. Column 3 shows results of logistic models that incorporate all mediators examined in columns 2 to 9 of Table 7 and the control variables specified in the model used in column 2, as well as provincial-level fixed effects. Finally, column 4 adopts a logistic model that adds the individual's phenotypes that are examined in columns 2 to 9 of Table 7 to the model used in column 3 (see Methods for details).
Variable coding: degree of pressure felt, 0 to 4; degree of economic pressure felt, 0 to 4; how often the respondent drinks, 0 to 2.
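The bootstrap step described in the Meta-analysis paragraph (SE and 95% CI of Cramer's V from 1,000 replications) can be sketched as follows. This is a minimal Python illustration, not the authors' Stata procedure: the percentile interval is an assumption, since the type of bootstrap interval is not specified, and the simulated couples and blood type shares are hypothetical.

```python
import math
import random

TYPES = ["A", "B", "AB", "O"]


def cramers_v(pairs):
    """Cramer's V for a list of (husband_type, wife_type) pairs."""
    n = len(pairs)
    counts = {(h, w): 0 for h in TYPES for w in TYPES}
    for pair in pairs:
        counts[pair] += 1
    row = {h: sum(counts[(h, w)] for w in TYPES) for h in TYPES}
    col = {w: sum(counts[(h, w)] for h in TYPES) for w in TYPES}
    chi2 = 0.0
    for h in TYPES:
        for w in TYPES:
            expected = row[h] * col[w] / n
            if expected > 0:  # skip cells whose row or column is empty
                chi2 += (counts[(h, w)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (len(TYPES) - 1)))


def bootstrap_cramers_v(pairs, replications=1000, seed=0):
    """Bootstrap SE and percentile 95% CI by resampling couples."""
    rng = random.Random(seed)
    n = len(pairs)
    draws = sorted(
        cramers_v(rng.choices(pairs, k=n)) for _ in range(replications)
    )
    mean = sum(draws) / replications
    se = math.sqrt(sum((d - mean) ** 2 for d in draws) / (replications - 1))
    lower = draws[int(0.025 * replications)]
    upper = draws[int(0.975 * replications) - 1]
    return se, (lower, upper)


# Simulated couples with hypothetical type shares and mild assortment:
# 10% of wives copy the husband's type, the rest are drawn independently.
rng = random.Random(42)
shares = [0.28, 0.29, 0.10, 0.33]
couples = []
for _ in range(500):
    husband = rng.choices(TYPES, weights=shares)[0]
    wife = husband if rng.random() < 0.10 else rng.choices(TYPES, weights=shares)[0]
    couples.append((husband, wife))

estimate = cramers_v(couples)
se, (lower, upper) = bootstrap_cramers_v(couples)
```

The resulting SE is what an inverse-variance weighting scheme would square to obtain each subsample's error variance.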
Regression analysis. We investigated the degree of assortative mating on blood type with logistic regression models. The models are specified as Eqs. 3 to 6, where $p^{g}_{\text{Wife},i}$ and $p^{g}_{\text{Husband},i}$ indicate the probability that the wife or the husband of couple i has type g blood. $I^{g}_{\text{Husband},i}$ and $I^{g}_{\text{Wife},i}$ refer to indicator variables that show whether the husband or the wife of couple i has type g blood. $BS^{g}_{\text{Wife},i}$ and $BS^{g}_{\text{Husband},i}$ denote the share of individuals with type g blood in the population of the birthplace of the wife or the husband in couple i, which is estimated from the population of this city covered in our full sample. $EGS^{g}_{\text{Wife},i}$ and $EGS^{g}_{\text{Husband},i}$ represent the share of individuals with type g blood in the population of the ethnicity of the wife or the husband in couple i, which is estimated from the population of this ethnicity covered in our full sample. Finally, $\text{Length of Marriage}_{i}$ indicates the time between when couple i got married and when they received a prepregnancy checkup. $I^{g}_{\text{Wife},i} \times \text{Length of Marriage}_{i}$ and $I^{g}_{\text{Husband},i} \times \text{Length of Marriage}_{i}$ are interaction terms of the indicator variables and the length of marriage of this couple. We also repeated the above steps using a linear regression model to explicitly estimate the increase in probability of matching induced by assortative mating on blood type.
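One way to write logistic specifications consistent with the variable definitions above is shown below for the wife's blood type, first without and then with control variables. This is a sketch under stated assumptions: the coefficient symbols $\beta_0, \ldots, \beta_5$ and the ordering of the two forms are illustrative and are not the authors' notation.

```latex
\begin{align*}
\log\frac{p^{g}_{\text{Wife},i}}{1 - p^{g}_{\text{Wife},i}}
  &= \beta_0 + \beta_1 I^{g}_{\text{Husband},i} \\
\log\frac{p^{g}_{\text{Wife},i}}{1 - p^{g}_{\text{Wife},i}}
  &= \beta_0 + \beta_1 I^{g}_{\text{Husband},i}
   + \beta_2 BS^{g}_{\text{Wife},i}
   + \beta_3 EGS^{g}_{\text{Wife},i} \\
  &\quad + \beta_4 \,\text{Length of Marriage}_{i}
   + \beta_5 \, I^{g}_{\text{Husband},i} \times \text{Length of Marriage}_{i}
\end{align*}
```

The corresponding specifications for the husband's blood type would swap the Wife and Husband subscripts throughout. In this form, $e^{\beta_1} - 1$ is the proportional increase in the odds of having type g blood when the partner has type g blood.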

Besides examining the degree of assortative mating via logistic regression models, we also performed mediation analysis to explore potential mechanisms behind the observed pattern. Specifically, we included different mediators in the regression models in Eqs. 3 and 5 and estimated the decline of the coefficients of the partner's blood type that can be attributed to the incorporation of mediators, which is shown as Eqs. 7 and 8.
In Eqs. 7 and 8, $\text{Mediator}_{\text{Wife},i}$ and $\text{Mediator}_{\text{Husband},i}$ measure the phenotype of the wife and the husband of couple i, respectively. The mediators we employed are education, whether the individual is a public employee, whether they work as a peasant, their height and weight, the pressure they feel, the economic pressure they feel, and their drinking habit. For all models, we adopted robust SEs in the model estimation to ensure that our estimates are robust when the model specification is incorrect. Regression analysis was performed using Stata.
Data, Materials, and Software Availability. We used 2014 to 2015 Chinese prepregnancy checkup data available from the Institute of Science and Technology of the NHC of the People's Republic of China. Data and code have been deposited in https://cloud.tsinghua.edu.cn/d/e86e227d8e66475ba790/ (48).